--- title: "GLM in R" author: "Oliver and Thurber" date: ' ' output: word_document: default html_document: default --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) # require(pacman) makes the previously installed pacman package manager available require(pacman) ``` ## Data Entry The following code creates a sample data frame that we will use in our examples. ```{r makedata, echo=TRUE} # Import the height and weight data htwt = read.csv("http://facweb1.redlands.edu/fac/jim_bentley/downloads/math111/htwt.csv") # The Group variable is actually categorical and not numeric htwt$Group = factor(htwt$Group,levels=c(1,2),labels=c("Male","Female")) head(htwt) ``` ## Testing The Population Mean ### The One Sample Test A simple test for the population mean of the **Weight** variable in the **htwt** data can be obtained via the **t.test** function. To compute the one sample t-test of H0: mu=145 we enter: ```{r ttest145, echo=TRUE} t.test(htwt$Weight, mu=145, alternative='two.sided', conf.level=.95) ``` An equivalent test of H0: mu=145 may be carried out using a linear model via the **lm** function. ```{r lmmean145, echo=TRUE} summary(lm((Weight-145)~1, data=htwt)) ``` Notice that adding the coefficient from the model to the hypothesized mean gives the sample mean. That is 145+(-5.4)=139.6. Note, too that the p-values computed by **t.test** and **lm** are the same (p=0.582). ### The Two Sample Test A simple test to compare the male and female population means of the **Weight** variable in the **htwt** data can also be obtained via the **t.test** function. To compute the two sample t-test of H0: mu.m = mu.f we enter: ```{r ttestmf, echo=TRUE} t.test(Weight~Group, alternative='two.sided', conf.level=.95, var.equal=TRUE, data=htwt) ``` An equivalent test of H0: beta1 = 0 = mu.m-mu.f may be carried out using a linear model via the **lm** function. ```{r lmmeanmf, echo=TRUE} htwt.lm.Group = lm(Weight~Group, data=htwt) summary(htwt.lm.Group) plot(htwt$Group,htwt$Weight) points(c(1,2),by(htwt[,2],htwt[,3],mean),type="b") par(mfrow=c(2,2)) plot(htwt.lm.Group) par(mfrow=c(1,1)) ``` Notice that intercept term (155) is the sample mean for the males. The sample mean for the females is the model evaluated for a female (155+(-28)=127). As in the one sample problem the p-values computed by **t.test** and **lm** are the same (p=0.153). ### Correcting for Height It is fairly clear from graphing **Weight** as a function of **Height** that when modeling a person's weight we should correct for height. While this cannot be accomplished using a t-test, a linear model makes the correction fairly easy. To test for H0: beta.1 = 0 when controlling for **Height** using the model Weight = beta.0 + beta.1 Female + beta.2 Height + epsilon we compute ```{r lmmfht, echo=TRUE} summary(lm(Weight~1+Group+Height, data=htwt)) plot(htwt$Height,htwt$Weight,pch=as.numeric(htwt$Group),xlab="Height",ylab="Weight") htwt.lm.HG=lm(Weight~Height+Group,data=htwt) abline(coef(htwt.lm.HG)%*%cbind(c(1,0,0),c(0,1,0)),lty=1) abline(coef(htwt.lm.HG)%*%cbind(c(1,0,1),c(0,1,0)),lty=2) legend(55,200,legend=c("Male","Female"),pch=c(1,2),lty=c(1,2)) ``` Notice that as before there does note appear to be a difference between females and males (p=0.655). However, it is clear that **Height** is predictive of **Weight** (p<0.001). ### Interaction Terms At this point we may be convinced that no differences exist in the weights of our two groups. Clearly the means for this sample are not significantly different. A little more insight may be gained by including an interaction term. We now fit the model Weight = beta.0 + beta.1 Female + beta.2 Height + beta.3 Female x Height + epsilon ```{r lmmfhtint, echo=TRUE} htwt.lm.GbyH = lm(Weight~1+Group*Height, data=htwt) summary(htwt.lm.GbyH) # Using lattice p_load(lattice) xyplot(Weight~Height, group=Group,data=htwt,type=c("p","r"), pch=c(1,3), col=c(5,6), key=list(text=list(lab=c("Male","Female")), lines=list(pch=c(1,3),lty=c(1,2),col=c(5,6),type="b"))) # Using ggplot p_load(ggplot2) ggplot(htwt, aes(x=Height, y=Weight, color=Group, shape=Group))+geom_point()+geom_smooth(method=lm, se=FALSE, fullrange=FALSE) ``` It is now clear that not only is height predictive of weight (p<0.0001), more importantly, females and males put weight on differently. Since the interaction term is significant (p=0.0274) this indicates that their slopes are different with the women putting on about one pound less per inch than the men. Diagnostic plots can be generated by using the **plot** function on the **lm** object, **htwt.lm.GbyH**. The figure shows the four diagnostic plots that are the default. An analysis of variance table may also be generated. ```{r lmdiag, echo=TRUE} # Set up the page to take all four images par(mfrow=c(2,2)) plot(htwt.lm.GbyH) par(mfrow=c(1,1)) anova(htwt.lm.GbyH) ``` The question of mean differences is thus shown to be the wrong question. The investigator should have been looking to see if men and women put on an equivalent number of pounds for each inch difference in height. This is something that is not apparent when looking at t-tests.